Abstract. Algebraic specifications of data types provide a natural basis
for testing data type implementations. In this framework, the conformance
relation is based on the satisfaction of axioms. This makes it possible to
formally state the fundamental concepts of testing: exhaustive test set,
testability hypotheses, oracle.

Various criteria for selecting finite test sets have been proposed. They
depend on the form of the axioms, and on the possibilities of observation
of the implementation under test. This last point is related to the
well-known oracle problem. As the main interest of algebraic specifications
is data type abstraction, testing a concrete implementation raises the
issue of the gap between the abstract description and the concrete
representation. The observational semantics of algebraic specifications
brings solutions on the basis of the so-called observable contexts.

After a description of testing methods based on algebraic specifications,
the chapter gives a brief presentation of some tools and case studies, and
presents some applications to other formal methods involving datatypes.

1 Introduction



Deriving test cases from some descriptions of the Implementation Under Test
(the IUT) is a very old and popular idea. In their pioneering paper [35],
Goodenough and Gerhart pointed out that the choice of test cases should be
based both on code coverage and on specifications expressed by condition
tables. One of the first papers where software testing was based on some
formal description of the system under test was by Chow [^: software was
modelled by finite state machines. It has been very influential on all the
subsequent works on testing based on formal specifications.

Most approaches in this area are based on behavioural descriptions: for
instance the control graph of the program, or some finite state machine or
labelled transition system. In such cases, it is rather natural to base the
selection of test scenarios on some coverage criteria of the underlying graph.

Algebraic specifications are different: abstract data types are described in an
axiomatic way [5,14,56]. There is a signature $\Sigma$, composed of a finite set
$S$ of sorts and a finite set $F$ of function names over the sorts in $S$, and
there is a finite set of axioms $Ax$. The correctness requirement is no longer,
as above, the ability (or the impossibility) for the IUT to exhibit certain
behaviours: what is required by such specifications is the satisfaction of the
axioms by the implementation of the functions of $F$. As a consequence, a
natural way of testing some IUT is to choose some instantiations of the axioms
(or of some consequences of them) and to check that when computed by the IUT,
the terms occurring in the instantiations yield results that satisfy the
corresponding axiom (or consequence). This approach was first proposed by
Gannon et al. [32] and Bouge et al. [15,16], and then developed and
implemented by Bernot et al. [10].

Since these foundational works, testing from algebraic specifications has been 
investigated a lot. Numerous works have addressed different aspects. 

Some authors, as in [6] or [23], focus on a target programming language (Ada or
Haskell). Testing from algebraic specifications has also been successfully
adapted for testing object-oriented systems [21,30,54]. Besides, methods
inspired from algebraic testing have been applied to some other kinds of
specifications like model-based specifications, first by Dick et al. [27], and
more recently in [^. Some other works explore links between test and proof
[7,18,31].

Some tools [18,23,48], based either on resolution procedures or on specialised
tactics in a proof engine, have been developed and used.

Extensions of algebraic specifications have also been studied, for instance,
bounded datatypes [4] or partial functions. More recently, some contributions
[29,45,46] have been made to take into account structured or modular
specifications, aiming at defining structured test cases and at modelling the
activity of both unit testing and integration testing.

Another special feature of algebraic specifications is the abstraction gap be- 
tween the abstract specification level and the concrete implementation. This 
raises problems for interpreting the results of test experiments with respect to 
the specification. This characteristic is shared with other formal methods that al- 
low the description of complex datatypes in an abstract way, for instance VDM, 
Z, and their object oriented extensions. 

As a consequence, in the area of testing based on algebraic specifications,
a special emphasis has been put on the oracle problem [3,8,42,44,57]. The
oracle problem concerns the difficulty of defining reliable decision procedures
to compare values of terms computed by the IUT. Actually, implementations of
abstract data types may have subtle or complex representations, and the
interface of the concrete datatypes is not systematically equipped with an
equality procedure to compare values. In practice, only some basic datatypes
provide a reliable decision procedure to compare values. They are said to be
observable. The only way to define a (partial) decision procedure for abstract
data types is to observe them by applying some (composition of) functions
yielding an observable result: they are called observable contexts.
Observational approaches of algebraic specifications bring solutions to define
an appropriate notion of correctness taking into account observability issues.

The chapter is organised as follows: Section [2] presents some necessary basic
notions of algebraic specifications; Section [3] gives the basic definitions of
test and test experiment against an algebraic specification; Section [4]
introduces in a progressive way the notions of exhaustive test set and
testability hypothesis in a simple case. Then Section [5] addresses the issue of
the selection of a finite test set via the so-called uniformity and regularity
selection hypotheses. Section [6] develops further the theory, addressing the
case where there are observability problems: this leads to a reformulation of
the definitions mentioned above, and to a careful examination of the notion of
correctness. Section [7] presents some of the most significant related pieces of
work. The last section is devoted to brief presentations of some case studies,
and to the descriptions of some transpositions of the framework to some other
formal methods where it is possible to specify complex data types.



2 Preliminaries on algebraic specifications 

Algebraic specifications of data types, sometimes called axiomatic specifications, 
provide a way of defining abstract data types by giving the properties (axioms) 
of their operations. There is no explicit definition of each operation (no pre- 
and post-condition, no algorithm) but a global set of properties that describes 
the relationship between the operations. This idea comes from the late seventies
[34,36]. It has been the origin of numerous pieces of work that have converged
on the definition of Casl, the Common Algebraic Specification Language [14].

An example of an algebraic specification is given in Figure 1: it is a Casl 
specification of containers of natural numbers, i.e. a data structure that contains 
possibly duplicated numbers with no notion of order. This specification states 
that there are three sorts of values, namely Natural Numbers, Booleans and 
Containers. Among the operations, there is, for instance, a function named isin 
which, given two values, resp. of sort natural number and container, returns a 
boolean value. The operations must satisfy the axioms that are the formulas 
itemised by big bullets. 

The sorts, operation names, and profiles of the operations are part of the 
signature of the specification. The signature gives the interface of the specified 
data type. Moreover, it declares some sorted variables that are used for writing 
the axioms. 

An (algebraic) signature $\Sigma = (S, F, V)$ consists of a set $S$ of sorts, a
set $F$ of function names each one equipped with an arity in $S^* \times S$ and
a set of variables $V$, each of them being indexed by a sort. In the sequel, a
function $f$ with arity $(s_1 \ldots s_n, s)$, where $s_1, \ldots, s_n, s \in S$,
will be noted $f : s_1 \times \ldots \times s_n \to s$.

In Figure 1, the sorts of the signature are Nat and Bool (specified in some
Our/Numbers/with/Bools specification, not given here), and Container;
the functions are [] (the empty container), _ :: _ (addition of a number to a
container), isin that checks whether a number belongs to a container, and
remove that takes away one occurrence of a number from a container; the
variables are x, y of Nat sort, and c of Container sort.



from Our/Numbers/with/Bools version 0.0 get Nat, Bool

spec Containers =
    Nat, Bool
then
    generated type Container ::= [] | __::__(Nat; Container)
    op isin : Nat × Container → Bool
    op remove : Nat × Container → Container
    ∀ x, y: Nat; c: Container
    • isin(x, []) = false                                          %(isin_empty)%
    • eq(x, y) = true => isin(x, y :: c) = true                    %(isin_1)%
    • eq(x, y) = false => isin(x, y :: c) = isin(x, c)             %(isin_2)%
    • remove(x, []) = []                                           %(remove_empty)%
    • eq(x, y) = true => remove(x, y :: c) = c                     %(remove_1)%
    • eq(x, y) = false => remove(x, y :: c) = y :: remove(x, c)    %(remove_2)%
end



Fig. 1. An algebraic specification of containers of natural numbers 
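
To make Figure 1 more concrete, the following Python sketch shows one possible implementation of the Containers signature. It is only an illustration: the representation (a tuple, most recently added number first) and the names `empty` and `cons` for the constructors [] and :: are assumptions, not something prescribed by the specification.

```python
# Hypothetical implementation of the Containers signature of Fig. 1.
# A container is a tuple of naturals, most recently added number first.

def empty():
    """The constructor []."""
    return ()

def cons(x, c):
    """The constructor x :: c."""
    return (x,) + c

def isin(x, c):
    """Follows isin_empty, isin_1 and isin_2."""
    if c == ():
        return False
    y, rest = c[0], c[1:]
    return True if x == y else isin(x, rest)

def remove(x, c):
    """Follows remove_empty, remove_1 and remove_2: one occurrence is removed."""
    if c == ():
        return ()
    y, rest = c[0], c[1:]
    return rest if x == y else cons(y, remove(x, rest))
```

Any other representation (a sorted array, a hash table with counters) would be an equally legitimate candidate, which is precisely why testing refers to the axioms rather than to the representation.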



Given a signature $\Sigma = (S, F, V)$, $T_\Sigma(V)$ is the set of terms with
variables in $V$ freely generated from variables and functions in $\Sigma$ and
preserving arity of functions. Such terms are indexed by the sort of their
result. We note $T_\Sigma(V)_s$ the subset of $T_\Sigma(V)$ containing exactly
those terms indexed by $s$.

$T_\Sigma$ is the set $T_\Sigma(\emptyset)$ of the ground terms and we note
$T_{\Sigma,s}$ the set of ground terms of sort $s$.

Considering the Container specification, an example of a ground term t
of Container sort is 0 :: 0 :: []. An example of a term t' with variables is
isin(x, 0 :: c) that is of Bool sort.

A substitution is any mapping $\rho : V \to T_\Sigma(V)$ that preserves sorts.
Substitutions are naturally extended to terms with variables. The result of the
application of a substitution $\rho$ to a term $t$ is called an instantiation of
$t$, and is noted $t\rho$. In the example, let us consider the substitution
$\sigma : \{x \mapsto 0, y \mapsto 0, c \mapsto y :: []\}$; the instantiation
$t'\sigma$ is the term with variable isin(0, 0 :: y :: []).
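
The application of a substitution can be sketched in a few lines of Python. The encoding below is an assumption made for illustration: a term is a nested tuple whose first component is either the tag "var" or a function name, and a substitution is a dictionary from variable names to terms.

```python
# Terms as nested tuples: ("var", name) or (function_name, arg_1, ..., arg_n).
# The tag "var" is reserved, so no function may be named "var" in this sketch.

def var(name):
    return ("var", name)

def app(f, *args):
    return (f,) + args

def substitute(term, rho):
    """Apply the substitution rho (dict: variable name -> term) to term."""
    if term[0] == "var":
        return rho.get(term[1], term)
    return (term[0],) + tuple(substitute(a, rho) for a in term[1:])

def is_ground(term):
    """A term is ground when it contains no variable."""
    if term[0] == "var":
        return False
    return all(is_ground(a) for a in term[1:])

zero, nil = app("0"), app("[]")
t_prime = app("isin", var("x"), app("::", zero, var("c")))
sigma = {"x": zero, "y": zero, "c": app("::", var("y"), nil)}
instance = substitute(t_prime, sigma)   # isin(0, 0 :: y :: [])
```

The substitution is applied simultaneously and only once, so the variable y introduced by the image of c is not itself replaced; this is why the instantiation still contains a variable.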

$\Sigma$-equations are formulae of the form $t = t'$ with
$t, t' \in T_\Sigma(V)_s$ for $s \in S$. An example of an equation on containers
is remove(x, []) = [].

A positive conditional $\Sigma$-formula is any sentence of the form
$\alpha_1 \wedge \ldots \wedge \alpha_n \Rightarrow \alpha_{n+1}$ where each
$\alpha_i$ is a $\Sigma$-equation $(1 \le i \le n+1)$. $Sen(\Sigma)$ is the set
of all positive conditional $\Sigma$-formulae.

A (positive conditional) specification $SP = (\Sigma, Ax, C)$ consists of a
signature $\Sigma$, a set $Ax$ of positive conditional formulae often called
axioms, and some constraints $C$, which may restrict the interpretations of the
declared symbols (some examples are given below). When $C$ is empty, we note
$SP = (\Sigma, Ax)$ instead of $SP = (\Sigma, Ax, \emptyset)$.

Specifications can be structured as seen in the example: a specification $SP$
can use some other specifications $SP_1, \ldots, SP_n$. In such cases, the
signature is the union of signatures, and there are some hierarchical
constraints that require the semantics of the used specifications to be
preserved (for more explanations see [55]).

In the Containers specification, there are six axioms, named isin_empty,
isin_1, isin_2, remove_empty, remove_1, and remove_2, and there is a so-called
generation constraint, expressed at the line beginning with generated type, that
all the containers are computable by composition of the functions [] and _ :: _.
Such constraints are also called reachability constraints. The functions [] and
_ :: _ are called the constructors of the Container type.

In some algebraic specification languages, axioms can be formulae of
first-order logic, as in Casl. However, in this chapter we mainly consider
positive conditional specifications^1.

A $\Sigma$-algebra $A$ is a family of sets $A_s$, each of them being indexed by
a sort; these sets are equipped, for each
$f : s_1 \times \ldots \times s_n \to s \in F$, with a mapping
$f^A : A_{s_1} \times \ldots \times A_{s_n} \to A_s$. A $\Sigma$-morphism $\mu$
from a $\Sigma$-algebra $A$ to a $\Sigma$-algebra $B$ is a mapping
$\mu : A \to B$ such that for all $s \in S$, $\mu(A_s) \subseteq B_s$ and for
all $f : s_1 \times \ldots \times s_n \to s \in F$ and all
$(a_1, \ldots, a_n) \in A_{s_1} \times \ldots \times A_{s_n}$,
$\mu(f^A(a_1, \ldots, a_n)) = f^B(\mu(a_1), \ldots, \mu(a_n))$.

$Alg(\Sigma)$ is the class of all $\Sigma$-algebras.

Intuitively speaking, an implementation of a specification with signature
$\Sigma$ is a $\Sigma$-algebra: it means that it provides some sets of values
named by the sorts, and some way of computing the functions on these values
without side effect.

The set of ground terms $T_\Sigma$ can be extended into a $\Sigma$-algebra by
providing each function name $f : s_1 \times \ldots \times s_n \to s \in F$ with
an application $f^{T_\Sigma} : (t_1, \ldots, t_n) \mapsto f(t_1, \ldots, t_n)$.
In this case, the function names of the signature are simply interpreted as the
syntactic constructions of the ground terms.

Given a $\Sigma$-algebra $A$, we note $\_^A : T_\Sigma \to A$ the unique
$\Sigma$-morphism that maps any $f(t_1, \ldots, t_n)$ to
$f^A(t_1^A, \ldots, t_n^A)$. A $\Sigma$-algebra $A$ is said to be reachable if
$\_^A$ is surjective.
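
The evaluation morphism can be read as a recursive interpreter of ground terms. The sketch below assumes, for illustration only, that ground terms are nested tuples headed by a function name and that an algebra is a dictionary mapping each function name to a Python callable; the symbol `succ` is a hypothetical constructor of Nat, whose actual specification is not given in this chapter.

```python
def evaluate(term, algebra):
    """Map the ground term f(t_1, ..., t_n) to f^A(t_1^A, ..., t_n^A)."""
    f, args = term[0], term[1:]
    return algebra[f](*(evaluate(a, algebra) for a in args))

# One possible algebra: naturals as Python ints, containers as Python lists.
list_algebra = {
    "0": lambda: 0,
    "succ": lambda n: n + 1,        # assumed constructor of Nat
    "[]": lambda: [],
    "::": lambda x, c: [x] + c,
}

one = ("succ", ("0",))
value = evaluate(("::", one, ("::", ("0",), ("[]",))), list_algebra)
```

In this algebra every list of naturals is the value of some ground term, which is what reachability requires; an algebra whose carrier also contained, say, lists with negative entries would not be reachable.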

A $\Sigma$-interpretation in $A$ is any mapping $\iota : V \to A$. It is just an
assignment of some values of the $\Sigma$-algebra to the variables. Given such
an interpretation, it is extended to terms with variables: the value of the term
is the result of its computation using the values of the variables and the
relevant $f^A$.

A $\Sigma$-algebra $A$ satisfies a $\Sigma$-formula
$\varphi : \bigwedge_{1 \le i \le n} t_i = t'_i \Rightarrow t = t'$, noted
$A \models \varphi$, if and only if for every $\Sigma$-interpretation $\iota$ in
$A$, if for all $i$ in $1..n$, $\iota(t_i) = \iota(t'_i)$ then
$\iota(t) = \iota(t')$. Given a specification $SP = (\Sigma, Ax, C)$, a
$\Sigma$-algebra $A$ is an $SP$-algebra if for every $\varphi \in Ax$,
$A \models \varphi$ and $A$ fulfils the $C$ constraint. $Alg(SP)$ is the
subclass of $Alg(\Sigma)$ exactly containing all the $SP$-algebras.

A $\Sigma$-formula $\varphi$ is a semantic consequence of a specification
$SP = (\Sigma, Ax)$, noted $SP \models \varphi$, if and only if for every
$SP$-algebra $A$, we have $A \models \varphi$.

^1 The reason is that most tools and case studies we present have been performed
for and with this kind of specifications. An extension of our approach to first
order logic, with some restrictions on quantifiers, was proposed by Machado in
[45].

3 Testing against an algebraic specification 

Let $SP$ be a positive conditional specification and IUT be an Implementation
Under Test. In dynamic testing, we are interested in the properties of the
computations by IUT of the functions specified in $SP$. IUT provides some
procedures or methods for executing these functions. The question is whether
they satisfy the axioms of $SP$.

Given a ground $\Sigma$-term $t$, we note $t^{IUT}$ the result of its
computation by IUT. Now we define how to test IUT against a $\Sigma$-equation.

Definition 1 (Test and Test Experiment). Given a $\Sigma$-equation
$\varepsilon$, and IUT which provides an implementation for every function name
of $\Sigma$,

- a test for $\varepsilon$ is any ground instantiation $t = t'$ of
$\varepsilon$;

- a test experiment of IUT against $t = t'$ consists in the evaluation of
$t^{IUT}$ and $t'^{IUT}$ and the comparison of the resulting values.

Example 1. One test of the isin_empty equation in the Containers specification
of Figure 1 is isin(0, []) = false.
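
A test experiment in the sense of Definition 1 can be sketched as follows. The encoding is an assumption: ground terms are nested tuples headed by a function name, the IUT is a dictionary from function names to callables, and `run_experiment` and its `equal` parameter (the comparison procedure) are hypothetical names.

```python
def evaluate(term, iut):
    """Compute a ground term by calling the functions provided by the IUT."""
    return iut[term[0]](*(evaluate(a, iut) for a in term[1:]))

def run_experiment(lhs, rhs, iut, equal=lambda a, b: a == b):
    """Evaluate both sides of the test lhs = rhs with the IUT and compare."""
    return equal(evaluate(lhs, iut), evaluate(rhs, iut))

# A hypothetical IUT storing containers as Python lists.
iut = {
    "0": lambda: 0,
    "false": lambda: False,
    "[]": lambda: [],
    "isin": lambda x, c: x in c,
}

# Example 1: the test isin(0, []) = false.
passes = run_experiment(("isin", ("0",), ("[]",)), ("false",), iut)

# A faulty IUT whose isin always answers true does not pass the same test.
faulty = dict(iut, isin=lambda x, c: True)
faulty_passes = run_experiment(("isin", ("0",), ("[]",)), ("false",), faulty)
```

Here the comparison is the built-in equality on booleans; for non-observable sorts the choice of `equal` is exactly where the oracle problem shows up.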

The generalization of this definition to positive conditional axioms is straight- 
forward. 

In the following, we say that a test experiment is successful if it concludes
to the satisfaction of the test by the IUT, and we note it IUT passes $\tau$
where $\tau$ is the test, i.e. a ground formula. We generalise this notation to
test sets: IUT passes $TS$ means that $\forall \tau \in TS$, IUT passes $\tau$.

Deciding whether IUT passes $\tau$ is the oracle problem mentioned in the
introduction. In the above example it is just a comparison between two boolean
values. However, such a comparison may be difficult when the results to be
compared have complex data types. We postpone the discussion on the way it
can be realised in such cases to Section [6]. Actually, we temporarily consider
in the two following sections that this decision is possible for all sorts,
i.e. they are all "observable".

Remark 1. Strictly speaking, the definition above defines a tester rather than
test data: a test $t = t'$ is nothing else than the abstract definition of a
program that evaluates $t$ and $t'$ via the relevant calls to the IUT and
compares the results; a test experiment is an execution of this tester linked to
the IUT.

We can now introduce a first definition of an exhaustive test of an
implementation against an algebraic specification. A natural notion of
correctness, when all the data types of the specification are observable, is
that the IUT satisfies the axioms of the specification. Thus we start with a
first notion of exhaustive test inspired from the notion of satisfaction as
defined in Section [2].



4 A first presentation of exhaustivity and testability

Definition 2 (Exhaustive Test Set, first version). Given a positive conditional
specification $SP = (\Sigma, Ax)$, an exhaustive test set for $SP$, noted
$Exhaust_{SP}$, is the set of all well-sorted ground instantiations of the
axioms in $Ax$:

$$Exhaust_{SP} = \{\varphi\rho \mid \varphi \in Ax,\ \rho \in V \to T_\Sigma\}$$

An exhaustive test experiment of some IUT against $SP$ is the set of all the
test experiments of the IUT against the formulas of $Exhaust_{SP}$.

As said above, this definition is very close to (and is derived from) the notion
of satisfaction of a set of $\Sigma$-axioms by a $\Sigma$-algebra. In
particular, the fact that each axiom can be tested independently comes from this
notion.

However, the fact that an implementation passes once all the tests in the
exhaustive test set does not necessarily mean that it satisfies the
specification: first, this is true only if the IUT is deterministic; second,
considering all the well-sorted ground instantiations is, a priori, not the same
thing as considering all the $\Sigma$-interpretations in the values of the IUT.
It may be the case that some values are not expressible by ground terms of the
specification.

In other words, the above test set is exhaustive with respect to the
specification, but may not be with respect to the values used by the program.
Thus some testability hypotheses on the implementation under test are necessary:
the success of the exhaustive test set ensures the satisfaction of the
specification by the implementation only if this implementation behaves as a
reachable $\Sigma$-algebra (cf. Section [2]).

Practically, it means that: 

— There is a realisation of every function of S that is supposed to be deter- 
ministic; the results do not depend on some hidden, non specified, internal 
state. 

— The implementation is assumed to be developed following good programming 
practices; any computed value of a data type must always be a result of the 
specified operations of this data type. 

— There is a comparison procedure for the values of every sort of the signature. 

Note that, explicitly or not, all testing methods make assumptions on IUT:
a totally erratic system, or a diabolic one, may pass some test set and fail
later on^2. In our case these hypotheses are static properties of the program.
Some of them are (or could be) checkable by some preliminary static analysis of
the source code.

^2 Testing methods based on Finite State Machine descriptions rely on the
assumption that the IUT behaves as a FSM with the same number of states as the
specification; similarly, methods based on IO-automata or IO-Transition Systems
assume that the IUT behaves as an IO-automaton: consequently, it is supposed
input-enabled, i.e. always ready to accept any input.



Definition 3 ($\Sigma$-Testability). Given a signature $\Sigma$, an IUT is
$\Sigma$-testable if it defines a reachable $\Sigma$-algebra $A_{IUT}$.
Moreover, for each $\tau$ of the form $t = t'$, there exists a way of deciding
whether it passes or not.

The $\Sigma$-testability of the IUT is called the minimal hypothesis $H_{min}$
on the IUT.

Let us note $Correct(IUT, SP)$ the correctness property that a given IUT
behaves as a reachable $SP$-algebra (i.e. the axioms are satisfied and all the
values are specified). The fundamental link between exhaustivity and testability
is given by the following formula:

$$H_{min}(IUT) \Rightarrow (\forall \tau \in Exhaust_{SP},\ IUT \text{ passes } \tau \Leftrightarrow Correct(IUT, SP))$$

$Exhaust_{SP}$ is obviously not usable in practice since it is generally
infinite. Actually, the aim of the definitions of $Exhaust_{SP}$ and $H_{min}$
is to provide frameworks for developing theories of black-box testing from
algebraic specifications. Practical test criteria (i.e. those which correspond
to finite test sets) will be described
as stronger hypotheses on the implementation. This point is developed in Sec- 
tions E] and [5] 

Before addressing the issue of the selection of finite test sets, let us come
back to the definition of $Exhaust_{SP}$. As it is defined, it may contain
useless tests, namely those instantiations of conditional axioms where the
premises are false: such tests are always successful, independently of the fact
that their conclusion is satisfied by the IUT or not. Thus they can be removed.

Example 2. Assuming that eq(0, 0) = true is a semantic consequence of the
Our/Numbers/with/Bools specification, we can derive an equational test
for the remove_1 conditional axiom in the Containers specification of Figure
1. This test is simply the ground equation:
remove(0, 0 :: 0 :: []) = 0 :: [].

In the example of Figure 1, we have distinguished a subset of functions as
constructors of the Container type (namely [] and ::). Under some conditions,
the presence of constructors in a specification makes it possible to
characterise an equational exhaustive test set.

A signature with constructors is a signature $\Sigma = \langle S, F, V \rangle$
such that a subset $C$ of elements of $F$ are distinguished as constructors. Let
us note $\Omega = \langle S, C, V \rangle$ the corresponding sub-signature of
$\Sigma$, and $T_\Omega$ the corresponding ground terms. A specification
$SP = \langle \Sigma, Ax \rangle$ where $\Sigma$ is a signature with
constructors $C$ is complete with respect to its constructors if and only if
both following conditions hold:

- $\forall t \in T_\Sigma$, $\exists t' \in T_\Omega$ such that
$SP \models t = t'$

- $\forall t, t' \in T_\Omega$, $SP \models t = t' \Rightarrow \langle \Sigma, \emptyset \rangle \models t = t'$,
i.e. $t$ and $t'$ are syntactically identical

Example 3. The Containers specification of Figure 1 is complete with respect
to the constructors C = {[], ::} of the Container sort: from the axioms, any
ground term of Container sort containing some occurrence of the (non
constructor) remove function is equal to some ground term containing only
occurrences of [] and ::. Moreover, there is only one such ground term.

For such specifications and under some new hypotheses on the IUT, it is
possible to demonstrate that the set of ground conclusions of the axioms is
exhaustive. When removing premises satisfied by the specification, we should be
careful not to remove some other premises that the IUT could interpret as true,
even if they are not consequences of the specification. A sufficient condition
is to suppose that the IUT correctly implements the constructors of all the
sorts occurring in the premises. Let us introduce the new testability hypothesis
$H_{min,C}$ for that purpose. Intuitively, $H_{min,C}$ means that the IUT
implements data types with a syntax very close to their abstract denotation. It
may seem to be a strong hypothesis, but in fact, it only applies to basic types,
often those provided by the implementation language. As soon as the data type
implementation is subtle or complex, the data type is then encapsulated and thus
considered as non observable for testing (cf. Section [6]).

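Definition 4. IUT satisfies $H_{min,C}$ iff IUT satisfies $H_{min}$ and:

$$\forall s \in S,\ \forall u, v \in T_{\Omega,s},\ IUT \text{ passes } u = v \Leftrightarrow SP \models u = v$$

Definition 5.

$$EqExhaust_{SP,C} = \{\varepsilon\rho \mid \exists\, \alpha_1 \wedge \ldots \wedge \alpha_n \Rightarrow \varepsilon \in Ax,\ \rho \in V \to T_\Sigma,\ SP \models (\alpha_1 \wedge \ldots \wedge \alpha_n)\rho\}$$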

Under $H_{min,C}$ and for specifications complete with respect to their
constructors, $EqExhaust_{SP,C}$ is an exhaustive test set. A proof can be found
in [41] or in [1]. Its advantage over $Exhaust_{SP}$ is that it is made of
equations. Thus the test experiments are simpler.
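
As an illustration of Definition 5, the sketch below enumerates a finite fragment of the equational tests derived from isin_1 and isin_2. Several simplifications are assumed: naturals are Python ints, containers are tuples, the premise eq(x, y) = true is taken to be a consequence of the specification exactly when the two numerals are identical, and `show` and `eq_tests` are hypothetical helper names.

```python
from itertools import product

def show(c):
    """Render a container (tuple of ints) in the :: notation of Fig. 1."""
    return " :: ".join([str(n) for n in c] + ["[]"])

def eq_tests(nats, containers):
    """Ground conclusions of isin_1 / isin_2 whose premise holds in SP."""
    tests = []
    for x, y, c in product(nats, nats, containers):
        lhs = f"isin({x}, {show((y,) + c)})"
        if x == y:   # eq(x, y) = true is a consequence: conclusion of isin_1
            tests.append((lhs, "true"))
        else:        # eq(x, y) = false is a consequence: conclusion of isin_2
            tests.append((lhs, f"isin({x}, {show(c)})"))
    return tests

sample = eq_tests([0, 1], [(), (1,)])
```

Each instantiation contributes only the conclusion of the axiom whose premise holds, so no test with a false premise is generated.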

Some other approaches for the definitions of exhaustivity and testability are
possible. For instance, as suggested in [11] and applied by Doong and Frankl in
the ASTOOT system [W, a different possibility is to consider the algebraic
specification as a term rewriting system, following a "normal-form" operational
semantics. Under the condition that the specification defines a
ground-convergent rewriting system, it leads to an alternative definition of the
exhaustive test set:

$$Exhaust'_{SP} = \{t = t\!\downarrow \ \mid\ t \in T_\Sigma\}$$

where $t\!\downarrow$ is the unique normal form of $t$. The testability
hypothesis can be weakened to the assumption that the IUT is deterministic (it
does not need anymore to be reachable). In [30], an even bigger exhaustive test
set was mentioned (but not used), which contained for every ground term the
inequalities with other normal forms, strictly following the definition of
initial semantics.
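
The normal-form reading can be sketched by orienting the axioms of Figure 1 from left to right and rewriting innermost. This is an illustrative normaliser, not the procedure of any tool cited here; it assumes naturals are Python ints, booleans are Python bools, and the other terms are nested tuples headed by a function name.

```python
NIL = ("[]",)

def cons(x, c):
    return ("::", x, c)

def normalise(t):
    """Innermost rewriting with the axioms of Fig. 1 read left to right."""
    if isinstance(t, (int, bool)) or t == NIL:
        return t
    op, args = t[0], [normalise(a) for a in t[1:]]
    if op == "::":
        return cons(*args)
    x, c = args
    if op == "isin":
        if c == NIL:
            return False                                   # isin_empty
        _, y, rest = c
        return True if x == y else normalise(("isin", x, rest))
    if op == "remove":
        if c == NIL:
            return NIL                                     # remove_empty
        _, y, rest = c
        return rest if x == y else cons(y, normalise(("remove", x, rest)))
    raise ValueError(f"unknown operation {op!r}")

def normal_form_test(t):
    """The test t = t↓ associated with the ground term t."""
    return (t, normalise(t))
```

For instance, `normal_form_test(("remove", 0, cons(0, cons(0, NIL))))` pairs the term with the constructor term 0 :: [], which is the equation of Example 2.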

Actually, this is an example of a case where the exhaustive test set is not
built from instantiations of the axioms, but more generally from an adequate
set of semantic consequences of the specification. Other examples are shown in
Section [6].



5 Selection hypotheses: uniformity, regularity 

5.1 Introduction to selection hypotheses 

A black-box testing strategy can be formalised as the selection of a finite
subset of some exhaustive test set. In the sequel, we work with
$EqExhaust_{SP,C}$, but what we say is general to the numerous possible variants
of exhaustive test sets.
Let us consider, for instance, the classical partition testing strategy^3. It
consists in defining a finite collection of (possibly non-disjoint) subsets that
covers the exhaustive test set. Then one element of each subset is selected and
submitted to the implementation under test. The choice of such a strategy
corresponds to stronger hypotheses than $H_{min}$ on the implementation under
test. We call such hypotheses selection hypotheses. In the case of partition
testing, they are called uniformity hypotheses, since the implementation under
test is assumed to behave uniformly on some test subsets $UTS_i$ (as Uniformity
Test Subset):

$$UTS_1 \cup \ldots \cup UTS_p = EqExhaust_{SP,C}, \text{ and}$$
$$\forall i = 1, \ldots, p,\ (\forall \tau \in UTS_i,\ IUT \text{ passes } \tau \Rightarrow IUT \text{ passes } UTS_i)$$
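
Under a uniformity hypothesis, test selection reduces to picking one representative per subset. The sketch below assumes that each sub-domain is given by a finite sample of its elements; the function name `select_representatives` and the sample data are hypothetical.

```python
import random

def select_representatives(subdomains, rng):
    """Pick one test per uniformity sub-domain UTS_i (given as finite samples)."""
    return {name: rng.choice(tests) for name, tests in subdomains.items()}

# Hypothetical samples of two sub-domains for isin(x, y :: c),
# written as triples (x, y, c) with c a tuple of naturals.
subdomains = {
    "eq(x, y) = true":  [(0, 0, ()), (1, 1, (2,)), (3, 3, (3, 0))],
    "eq(x, y) = false": [(0, 1, ()), (1, 0, (3,)), (2, 5, (2,))],
}
selected = select_representatives(subdomains, random.Random(0))
```

Passing the seeded generator explicitly keeps the selection reproducible, which matters when a failed test has to be replayed.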

Various selection hypotheses can be formulated and combined depending on
some knowledge of the program, some coverage criteria of the specification and
ultimately cost considerations. Another type of selection hypothesis is the
regularity hypothesis, which uses a size function on the tests and has the form
"if the subset of $EqExhaust_{SP,C}$ made up of all the tests of size less than
or equal to a given limit is passed, then $EqExhaust_{SP,C}$ also is"^4.
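
A regularity hypothesis can be illustrated by enumerating all instantiations below a size limit. In this sketch the size of a test is assumed to be the length of the container it mentions, naturals are drawn from a finite pool supplied by the tester, and both function names are hypothetical.

```python
from itertools import product

def containers_up_to(max_len, nats):
    """All containers (as tuples) over `nats` of length <= max_len."""
    for n in range(max_len + 1):
        yield from product(nats, repeat=n)

def regularity_tests(max_len, nats):
    """Pairs (x, c) instantiating isin(x, c) or remove(x, c), size(c) <= max_len."""
    return [(x, c) for c in containers_up_to(max_len, nats) for x in nats]

small = regularity_tests(2, [0, 1])
```

With two naturals and a limit of 2 there are 1 + 2 + 4 containers, hence 14 pairs; the count grows exponentially with the limit, which is one reason the limit has to be chosen with care.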

All these hypotheses are important from a theoretical point of view because 
they formalise common test practices and express the gap between the success 
of a test strategy and correctness. They are also important in practice because 
exposing them makes clear the assumptions made on the implementation. Thus, 
they give some indication of complementary verifications, as used by Tse et al. 
in [20]. Moreover, as pointed out by Hierons in [39], they provide formal bases 
to express and compare test criteria and fault models. 

5.2 How to choose selection hypotheses 

As said above, the choice of the selection hypotheses may depend on many
factors. However, in the case of algebraic specifications, the text of the
specification provides useful guidelines. These guidelines rely on coverage of
the axioms and composition of the cases occurring in premise of the axioms via
unfolding as stated first in [10], and extended recently in [1].

^3 More exactly, it should be called sub-domain testing strategy.

^4 As noticed by several authors, [30], [20], and from our own experience [^,
such hypotheses must be used with care. It is often necessary to choose this
limit taking into consideration some "white-box knowledge" on the implementation
of the datatypes: array bounds, etc.



We recall that axioms are of the form
$\alpha_1 \wedge \ldots \wedge \alpha_n \Rightarrow \alpha_{n+1}$ where each
$\alpha_i$ is a $\Sigma$-equation $t_i = t'_i$ $(1 \le i \le n+1)$.

From the definition of $EqExhaust_{SP,C}$, a test of such an axiom is some
$\alpha_{n+1}\rho$ where $\rho \in V \to T_\Sigma$ is a well-typed ground
substitution of the variables of the axiom such that the premise of the axiom,
instantiated by $\rho$, is true: it is a semantic consequence of the
specification ($SP \models (\alpha_1 \wedge \ldots \wedge \alpha_n)\rho$).

One natural basic testing strategy is to cover each axiom once, i.e. to choose
for every axiom one adequate substitution $\rho$ only. The corresponding
uniformity hypothesis is

$$\forall \rho \in V \to T_\Sigma \text{ such that } SP \models (\alpha_1 \wedge \ldots \wedge \alpha_n)\rho,\ IUT \text{ passes } \alpha_{n+1}\rho \Rightarrow$$
$$(IUT \text{ passes } \alpha_{n+1}\rho',\ \forall \rho' \in V \to T_\Sigma \text{ such that } SP \models (\alpha_1 \wedge \ldots \wedge \alpha_n)\rho')$$

It defines a so-called uniformity sub-domain for the variables of the axiom
that is the set of ground $\Sigma$-terms characterised by
$SP \models (\alpha_1 \wedge \ldots \wedge \alpha_n)$.

Example 4. In the example of Figure 1, covering the six axioms requires six
tests, for instance the following six ground equations:

- isin(0, []) = false, with the whole Nat sort as uniformity sub-domain;

- isin(1, 1 :: 2 :: []) = true, with the pairs of Nat such that eq(x, y) = true
and the whole Container sort as uniformity sub-domain;

- isin(1, 0 :: 3 :: []) = isin(1, 3 :: []), with the pairs of Nat such that
eq(x, y) = false and the whole Container sort as uniformity sub-domain;

- remove(1, []) = [], with the Nat sort as uniformity sub-domain;

- remove(0, 0 :: 1 :: []) = 1 :: [], with the pairs of Nat such that
eq(x, y) = true and the Container sort as uniformity sub-domain;

- remove(1, 0 :: []) = 0 :: remove(1, []), with the pairs of Nat such that
eq(x, y) = false and the Container sort as uniformity sub-domain.
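
The six tests of Example 4 can be turned into an executable test suite. The implementation under test below is a hypothetical one, assumed for illustration, that stores containers as Python lists with the most recently added number first.

```python
# Hypothetical IUT: containers are Python lists, newest element first.
def isin(x, c):
    return x in c

def remove(x, c):
    c = list(c)
    if x in c:
        c.remove(x)   # list.remove deletes the first occurrence only
    return c

# One ground equation per axiom, as in Example 4.
SIX_TESTS = {
    "isin_empty":   lambda: isin(0, []) == False,
    "isin_1":       lambda: isin(1, [1, 2]) == True,
    "isin_2":       lambda: isin(1, [0, 3]) == isin(1, [3]),
    "remove_empty": lambda: remove(1, []) == [],
    "remove_1":     lambda: remove(0, [0, 1]) == [1],
    "remove_2":     lambda: remove(1, [0]) == [0] + remove(1, []),
}

verdicts = {name: run() for name, run in SIX_TESTS.items()}
```

Each entry evaluates both sides of the equation with the IUT and compares them, so `verdicts` records which axioms have been covered successfully.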

Such uniformity hypotheses are often too strong. A method for weakening
them, and getting more test cases, is to compose the cases occurring in the
axioms. In the full general case, it may involve tricky pattern matching on the
premises and conclusions, and even some theorem proving. However, when the
axioms are in a suitable form one can use the classical unfolding technique
defined by Burstall and Darlington in [19]. It consists in replacing a function
call by its definition. Thus, for unfolding to be applicable, the axioms must be
organised as a set of function definitions: every function is defined by a list
of conditional equations such as:

$$\bigwedge_{1 \le i \le m} \alpha_i \Rightarrow f(t_1, \ldots, t_n) = t$$

where the domain of the function must be covered by the disjunction of the
premises of the list.

Example 5. In the example of Figure 1, the isin function is defined by:

    • isin(x, []) = false                                  %(isin_empty)%
    • eq(x, y) = true => isin(x, y :: c) = true            %(isin_1)%
    • eq(x, y) = false => isin(x, y :: c) = isin(x, c)     %(isin_2)%



It means that every occurrence of $isin(t_1, t_2)$ can correspond to the three
following sub-cases:

- $t_2 = []$: in this case $isin(t_1, t_2)$ can be replaced by false;

- $t_2 = y :: c$ and $eq(t_1, y) = true$: in this case, it can be replaced by
true;

- $t_2 = y :: c$ and $eq(t_1, y) = false$: in this case, it can be replaced by
$isin(t_1, c)$.

A way of partitioning the uniformity sub-domain induced by the coverage of
an axiom with some occurrence of $f(t_1, \ldots, t_n) = t$ is to introduce the
sub-cases stated by the definition of $f$, and, of course, to perform the
corresponding replacements in the conclusion equation to be tested. This leads
to a weakening of the uniformity hypotheses.

Example 6. Let us consider the isin_2 axiom. Its coverage corresponds to the
uniformity sub-domain "pairs of Nat such that eq(x, y) = false" × "the Container
sort". Let us unfold in this axiom the second occurrence of isin, i.e. isin(x, c).
It leads to three sub-cases for this axiom:

– c = []:

eq(x, y) = false ∧ c = [] ⇒ isin(x, y :: []) = isin(x, []), i.e., false;

– c = y' :: c' and eq(x, y') = true:

eq(x, y) = false ∧ c = y' :: c' ∧ eq(x, y') = true ⇒ isin(x, y :: y' :: c') =
isin(x, y' :: c'), i.e., true;

– c = y' :: c' and eq(x, y') = false:

eq(x, y) = false ∧ c = y' :: c' ∧ eq(x, y') = false ⇒ isin(x, y :: y' :: c') =
isin(x, y' :: c'), i.e., isin(x, c').

The previous uniformity sub-domain is partitioned into three smaller sub-domains
characterised by the three premises above. Covering these sub-cases leads to testing
bigger containers, and to checking that isin behaves correctly whether or not the
searched number was the last one added to the container.
Applying the same technique to the remove_2 axiom leads to testing that, in case
of duplicates, only one occurrence is removed.
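The partition of Example 6 can be illustrated by a small Python sketch. It is not the LOFT procedure, which works symbolically: here ground instances are simply enumerated and classified, and one representative is kept per sub-domain, as a uniformity hypothesis would allow. The encoding of containers as tuples and the names `subdomain` and `one_test_per_subdomain` are assumptions made for the illustration.

```python
from itertools import product

# Assumed encoding: () stands for [] and (y,) + c stands for y :: c.
def isin(x, c):
    """Reference reading of the axioms isin_empty, isin_1 and isin_2."""
    if not c:
        return False
    return True if x == c[0] else isin(x, c[1:])

def subdomain(x, y, c):
    """Sub-domain of the isin_2 domain (eq(x, y) = false) after unfolding isin(x, c) once."""
    assert x != y, "outside the domain of isin_2"
    if not c:
        return "c = []"
    if x == c[0]:
        return "c = y' :: c' and eq(x, y') = true"
    return "c = y' :: c' and eq(x, y') = false"

def one_test_per_subdomain(nats, max_len):
    """Keep the first ground instance (x, y, c) met in each sub-domain."""
    chosen = {}
    for n in range(max_len + 1):
        for c in product(nats, repeat=n):
            for x, y in product(nats, repeat=2):
                if x != y:
                    chosen.setdefault(subdomain(x, y, c), (x, y, c))
    return chosen

tests = one_test_per_subdomain(nats=range(3), max_len=2)
for label, (x, y, c) in tests.items():
    # Each selected test is the ground instance isin(x, y :: c) = isin(x, c).
    print(f"{label}: isin({x}, {(y,) + c}) = isin({x}, {c})")
```

Replacing the reference `isin` by calls to an implementation under test would turn each selected triple into an executable test.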

Of course, unfolding can be iterated: the last case above can be decomposed
again into three sub-cases. Unbounded unfolding generally leads to infinite test
sets^5. Limiting the number of unfoldings is generally sufficient for ensuring the
finiteness of the test set. Experience has shown (see Section [5]) that in practice
one or two levels of unfolding are sufficient for ensuring what test engineers
consider as a good coverage and a very good detection power. In some rare
cases, this limitation of unfolding does not suffice for getting a finite test set:
then, it must be combined with regularity hypotheses, i.e. a limitation of the size
of the ground instantiations.

^5 Actually, as it is described here, unbounded unfolding yields an infinite set of
equations very close to the exhaustive test set. The only remaining variables are those
that are operands of functions without definitions, namely, in our case, constructors.


Unfolding has been implemented by Marre within the tool LOFT [10, 47, 48]
using logic programming. There are some conditions on the specifications
manipulated by LOFT:

– they must be complete with respect to constructors;

– when transforming the specification into a conditional rewriting system (by
orienting each equation t = t' occurring in an axiom from left to right, t → t'),
the resulting conditional rewrite system must be confluent and terminating;

– each equation t = t' that is the conclusion of an axiom must be such that t
may be decomposed as a function f, not belonging to the set of constructors,
applied to a tuple of terms built on constructors and variables only.

Under these conditions, the LOFT tool can decompose any uniformity domain
into a family of uniformity sub-domains. It can also compute some solutions in
a given uniformity sub-domain. These two steps correspond respectively to the
computation of the uniformity hypotheses based on unfolding subdomains and
to the generation of an arbitrary test case for each computed subdomain. The
unfolding procedure is based on an equational resolution procedure involving
some unification mechanisms. Under the conditions on the specifications given
above, the unfolding procedure computes test cases such that: sub-domains are
included in the domain they are issued from (soundness), and the decomposition
into subdomains covers the split domain (completeness).

In [1], Aiguier et al. have extended the unfolding procedure to positive
conditional specifications without restrictions. This procedure is also sound and
complete. However, the price to pay is that instead of unfolding a unique
occurrence of a defined function, the extended unfolding procedure requires
unfolding all occurrences of the defined functions in a given equation among all the
equations characterising the domain under decomposition. This may result in
numerous test cases.

We have seen that conditional tests can be simplified into equational ones
by solving their premises. It can be done in another way, replacing variables
occurring in the axiom by terms as many times as necessary to find good
instantiations. This method amounts to drawing terms as long as the premises are not
satisfied. This is particularly well suited to a probabilistic setting. In [9], Bouaziz
et al. give some means to build some distributions on the sets of values.
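The "draw until the premise holds" strategy can be sketched as follows in Python. The uniform distribution, the bounds and the function names (`draw_container`, `draw_instances`) are assumptions chosen for the illustration; they are not the distributions built in [9]. The bound `max_draws` makes the procedure give up when the premise is too rarely satisfied.

```python
import random

def draw_container(rng, max_len=4, max_nat=5):
    """Draw a container (as a tuple of naturals) of random length."""
    return tuple(rng.randint(0, max_nat) for _ in range(rng.randint(0, max_len)))

def draw_instances(premise, wanted, max_draws, seed=0):
    """Draw triples (x, y, c) until `wanted` of them satisfy `premise`,
    or until `max_draws` draws have been made."""
    rng = random.Random(seed)  # fixed seed: reproducible test set
    kept = []
    for _ in range(max_draws):
        if len(kept) == wanted:
            break
        x, y = rng.randint(0, 5), rng.randint(0, 5)
        c = draw_container(rng)
        if premise(x, y, c):   # instances falsifying the premise are discarded
            kept.append((x, y, c))
    return kept

# Premise of isin_1: eq(x, y) = true.
instances = draw_instances(lambda x, y, c: x == y, wanted=10, max_draws=10_000)
for x, y, c in instances:
    print(f"isin({x}, {(y,) + c}) = true")
```

Each kept triple yields the equational test obtained from the conclusion of the axiom; a premise that is rarely satisfied under the chosen distribution is a signal that the generator should be specialised.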

6 Exhaustivity and testability versus observability 

Until now, we have supposed that a test experiment t = t' of the IUT may be
successful or not depending on whether the evaluations of t and t' yield the same
resulting values. Sometimes, comparing the test outputs may be a complex task
when some information is missing. It often corresponds to complex abstract data
types encapsulating some internal concrete data representations. Some abstract
data types (sets, stacks, containers, etc.) do not always provide an equality
procedure within the implementation under test and we cannot reasonably suppose
the existence of a finite procedure, the oracle, to correctly interpret the test
results as equalities or inequalities. The so-called oracle problem in the framework
of testing from algebraic specifications amounts to dealing with equalities between
terms of non observable sorts.

In this section, we distinguish a subset $S_{obs}$ of observable sorts among the
set S of all sorts. For example, it may gather all the sorts equipped with an
equality predicate within the IUT environment, for instance equality
predicates provided by the programming language and considered as reliable. The
minimal hypothesis $H_{min}$ is relaxed to the weaker hypothesis $H^{obs}_{min}$ expressing
that the IUT still defines a reachable Σ-algebra but that the only remaining
elementary tests which may be interpreted by the IUT as a success/failure verdict
are the ground equalities t = t' of observable sort. The set Obs of all observable
formulae is the subset of Sen(Σ) of all formulae built over observable ground
equalities. Any formula of Obs may be considered as a test experiment, and
reciprocally.

The oracle problem in the case of non observable sorts may be tackled through two
distinct but related questions. How can non observable equalities under test be
turned into test experiments tractable by an IUT only satisfying $H^{obs}_{min}$? How far can
we still talk about correctness when dealing with observability issues? Roughly
speaking, the answers lie respectively in using observable contexts and in defining
correctness up to some observability notion. We present these two corresponding
key points in the following sections.

6.1 Observable contexts 

In practice, non observable abstract data types can be observed through
successive applications of functions leading to an observable result. It means that
properties related to non observable sorts can be tested through observable
contexts:

Definition 6 (Context and observable context).

An observable context c for a sort s is a term of observable sort with a unique
occurrence of a special variable of sort s, generically denoted by z.

Such a context is often denoted c[z] or simply c[.] and c[t] denotes cσ where
σ is the substitution associating the term t to the variable z.

An observable context is said to be minimal if it does not contain an observable
context as a strict subterm^6.

Only minimal observable contexts are meaningful for testing. Indeed, if a
context c has an observable context c' as a strict subterm, then c[z] may be
decomposed as c0[c'[z]]. It implies that for any terms t and t', c[t] = c[t'] iff
c'[t] = c'[t']. Both equalities being observable, the simpler one, c'[t] = c'[t'],
suffices to infer whether c[t] = c[t'] holds or not. In the sequel, all the observable
contexts will be considered as minimal by default.

^6 A subterm of a term t is t itself or any term occurring in it. In particular, if t is of
the form f(t1, …, tn) then t1, …, tn are subterms of t. A strict subterm of t is any
subterm of t which differs from t.

For example, we can use set cardinality and element membership to observe
some set data type, as well as the height and the top of all successive popped
stacks for some stack data type. Thus, a non observable ground equality of the
form t = t' is observed through all observable contexts c[.] applied to both t and
t'. From a testing point of view, it amounts to applying to both terms t and t' the
same successive applications of operations yielding an observable value, and to
comparing the resulting values.

Example 7. With the Containers specification of Figure 1, we now consider
that the sort Container is no longer an observable sort while Nat and Bool are
observable ones. Ground equalities of sort Container should be observed through
the observable sorts Nat and Bool. An abstract test like remove(3, []) = []
is now observed through observable contexts. Each observable context for the sort
Container gives rise to a new (observable) test belonging by construction to
Obs. For example, the context isin(3, z) applied to the previous abstract test
leads to the test: isin(3, remove(3, [])) = isin(3, []).

In practice, there is often an infinity of such observable contexts. In the case
of the Containers specification, we can build the following observable contexts^7:

isin(x, x1 :: (x2 :: … (xn :: z))), isin(x, remove(x1, remove(x2, …, remove(xn, z))))

or more generally, any combination of the operations remove and ::
surrounded by the isin operation. As a consequence, we are facing a new kind of
selection problem: to test an equality t = t' of Container sort, one has to select
among all these observable contexts a subset of finite or even reasonable size.
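To make the shape of these contexts concrete, here is a Python sketch that enumerates the contexts isin(x, w[z]), where w stacks at most a given number of applications of :: and remove, and uses them to observe a Container equality. The list-based implementation, the bound on the depth and the names `contexts` and `passes` are assumptions of the sketch; bounding the depth amounts to a regularity hypothesis on contexts.

```python
from itertools import product

# Hypothetical implementation under test: containers as Python lists.
def cons(x, c):
    return [x] + c

def remove(x, c):
    """Remove the first occurrence of x, if any."""
    if not c:
        return []
    return c[1:] if c[0] == x else [c[0]] + remove(x, c[1:])

def isin(x, c):
    return x in c

def contexts(nats, depth):
    """Yield the contexts isin(x, w[z]) with w built from at most `depth`
    nested applications of :: and remove around z."""
    wrappers = [lambda z: z]
    layer = wrappers
    for _ in range(depth):
        layer = [(lambda z, f=f, op=op, n=n: op(n, f(z)))
                 for f in layer for op in (cons, remove) for n in nats]
        wrappers = wrappers + layer
    for w, x in product(wrappers, nats):
        yield lambda z, w=w, x=x: isin(x, w(z))

def passes(lhs, rhs, nats=range(3), depth=2):
    """Observe the Container equality lhs = rhs through the selected contexts."""
    return all(c(lhs) == c(rhs) for c in contexts(nats, depth))

print(passes(remove(3, []), []))   # the abstract test remove(3, []) = []
print(passes([0, 0], [0]))         # distinguished by isin(0, remove(0, z))
```

With three naturals and depth 2 this already produces 129 contexts, which gives an idea of how quickly the selection problem grows.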

Bernot in [8] gives a counter-example based on the stack data type to show
that without additional information on the IUT, all the contexts are a priori
necessary to test a non observable equality, even those involving constructors
such as ::. More precisely, a context of the form isin(x, x1 :: z) may appear useless
since it leads to building larger Container terms instead of observing the terms
replacing z. In [8], it is shown that those contexts may reveal some programming
errors depending on a bad use of state variables. From a theoretical point of
view, let us consider a specification reduced to one axiom a = b expressing that
two non observable constants are equal. Then for any given arbitrary minimal
context $c_0$, one can design a program $P_{c_0}$ making c[a] = c[b] true for all minimal
observable contexts except $c_0$. This fact means that in general, any minimal
context is needed to "fully" test non observable equalities. This is a simplified
explanation of a proof given by Chen, Tse et al. in [20].

Let us point out that replacing an equation t = t' by the (infinite) set of
c[t] = c[t'] with c an observable context is classical within the community of
algebraic specifications. Different observational approaches [13, 53] have been
proposed to cope with refinement of specifications based on abstract data types.
They have introduced the so-called behavioural equalities, denoted by t ≈ t'.
The abstract equality is replaced by the (infinite) set of all observable contexts
applied to both terms. More precisely, an algebra A satisfies t ≈ t' if and
only if for every Σ-interpretation ι in A, for all observable contexts c, we have
ι(c[t]) = ι(c[t']). Behavioural equalities allow the specifier to refine abstract data
types with concrete data types that do not satisfy some properties required at
the abstract level. For example, the Set abstract data type with some axioms
stating the commutativity of the element insertion can be refined into the List
abstract data type, where the addition of an element cannot, by construction, be
commutative. The refinement of Set by List is ensured by requiring that
equalities on sets hold in the list specification only up to the behavioural equality. It
amounts to stating that observable operations (here the membership operation)
behave in the same way at the abstract level of sets and at the implementation
level of lists, and to ignoring those properties of the implementation that are not
observable.

^7 For convenience, we use the variables x, x1, …, xn to denote arbitrary ground terms
of sort Nat in a concise way.

Considering an infinity of contexts is possible using context induction as
defined by Hennicker in [35]. This is useful to prove a refinement step, but is
useless in order to define an oracle. So, how can we select a finite set of observable
contexts? Below we give some hints:

– The selection hypotheses presented in Section [5] to choose particular
instantiations of axiom variables can be transposed to choose observable contexts.
In particular, a rather natural way of selecting contexts consists in applying
a regularity hypothesis. The size of a context is often defined in relation with
the number of occurrences of non observable functions occurring in it.

– If one can characterise the equality predicate by means of a set of axioms,
then one can use this axiomatisation, as proposed by Bidoit and Hennicker
in [12], to define the test of non observable equalities. To give an intuition
of what such an axiomatisation looks like, we give below the most classical
one. It concerns the specification of abstract data types like sets, bags or
containers, for which two terms are equal if and only if they contain exactly
the same elements. Such an axiomatisation looks like:

c ≈ c' iff ∀e, isin(e, c) = isin(e, c')

where c and c' are variables of the abstract data type to be axiomatised, and
e is a variable of element sort. c ≈ c' denotes the behavioural equality that
is axiomatised. The axiomatisation simply expresses that the subset of
contexts of the form isin(e, z) suffices to characterise the behavioural equality.
This particular subset of contexts can then be chosen as a suitable starting
point to select observable contexts to test non observable equalities. Such
an approach has two main drawbacks. First, such a finite axiomatisation may
not exist^8 or be difficult to guess. Second, selecting only from the subset
of observable contexts corresponding to a finite axiomatisation amounts to
making an additional hypothesis on the IUT, which has been called the
oracle hypothesis in [5]. In a few words, it consists in supposing that the IUT
correctly implements the data type with respect to the functions involved in
the axiomatisation. In the example of Containers, two containers are
supposed to be behaviourally equal if and only if the membership operation isin
applied on the containers always gives the same results. In other words, by
using axiomatisation to build oracles, we are supposing exactly what we are
supposed to test. Clearly, it may appear as too strong a hypothesis.

– Chen, Tse and others in [20] point out that some static analysis of the
IUT may help to choose an adequate subset of observable contexts. When
testing whether t = t' holds or not, the authors compare their internal
representations r and r' within the IUT. If r and r' are equal, then they
can conclude^9 that the IUT passes t = t'. Otherwise, if r and r' are not
equal, then they study which data representation components are different
in r and r' and which observations may reveal the difference.
This makes it possible to build a subset of observable contexts which has
a good chance of observationally distinguishing t and t'. The heuristic they
have proposed has been successfully applied in an industrial context [55].

^8 For example, the classical stack specification has no finite axiomatisation of stack
equality.

6.2 Correctness with observability issues 



We have seen in Section 6.1 that the test of a non observable equality may
be approached by a finite subset of observable contexts. More precisely, a non
observable ground equality t = t' may be partially verified by submitting a finite
subset of the test set:

$Obs(t = t') = \{\, c[t] = c[t'] \mid c \text{ is a minimal observable context} \,\}$

The next question concerns testability issues: can we adapt the notions of
correctness and exhaustivity when dealing with observability? For example, one
may wonder whether the set Obs(t = t') may be considered as an exhaustive test
set for testing the non observable (ground) equality t = t'. More generally, by
taking inspiration from the presentation given earlier in the chapter, we look for a general
property linking the notions of exhaustive test set and testability such as:

$H^{obs}_{min}(IUT) \Rightarrow \bigl(\forall \tau \in Exhaust^{obs}_{SP},\ IUT \text{ passes } \tau \iff Correct^{obs}(IUT, SP)\bigr)$

6.2.1 Equational specifications 

If SP is an equational specification^10, then following Section [O] the test set

$Exhaust^{obs}_{SP} = \{\, c[t]\rho = c[t']\rho \mid t = t' \in Ax,\ \rho : V \to T_\Sigma,\ c \text{ minimal observable context} \,\}$

is a good candidate^11 since it simply extends the Obs(t = t') sets to the
case of equations with variables. Actually, $Exhaust^{obs}_{SP}$ is an exhaustive test set
provided that we reconsider the definition of correctness taking into account
observability.

^9 [1^ is partially based on this same idea: if the concrete implementations are
identical, then necessarily their corresponding abstract denotations are equal terms.
^10 Axioms of an equational specification are of the form t = t' where t and t' are terms
with variables and of the same sort.

By definition of observability, the IUT does not give access to any information
on non observable sorts. Considering a given IUT as correct with respect to
some specification SP should be defined up to all the possible observations and
by discarding properties directly expressed on non observable sorts. Actually,
observational correctness may be defined as follows: IUT is observationally correct
with respect to SP according to the set of observations Obs, if there exists an
SP-algebra A such that IUT and A behave in exactly the same way for all
possible observations.

To illustrate, let us consider the case of the Container specification enriched
by a new axiom of commutativity of element insertions:

x :: (y :: c) = y :: (x :: c)

The Container datatype is classically implemented by the List data type.
However, elements in lists are usually stored according to the order of their insertion.
In fact, the List data type is observationally equivalent to the Container data
type as soon as the membership operation is correctly implemented in the List
specification. It matters little whether or not the List insertion function satisfies
the axioms concerning the addition of elements in Containers.
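The following Python sketch illustrates the point on a hypothetical list implementation: ground instances of the commutativity axiom fail when the two sides are compared concretely, but they hold when observed through the membership operation only. The function names and the small ranges of values are assumptions of the sketch.

```python
from itertools import product

# Hypothetical List implementation of Containers.
def cons(x, c):
    return [x] + c

def isin(x, c):
    return x in c

def commutativity_instances(nats, tails):
    """Ground instances (lhs, rhs) of x :: (y :: c) = y :: (x :: c)."""
    for x, y, c in product(nats, nats, tails):
        yield cons(x, cons(y, c)), cons(y, cons(x, c))

nats, tails = range(3), [[], [0], [2, 1]]

# Concrete comparison of the internal representations: order matters.
concrete = all(l == r for l, r in commutativity_instances(nats, tails))

# Observation through the contexts isin(e, z) only.
observed = all(isin(e, l) == isin(e, r)
               for l, r in commutativity_instances(nats, tails)
               for e in range(4))

print(concrete, observed)   # False True
```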

This is formalised by introducing equivalence relations between algebras
defined up to a set of Σ-formulae.

Definition 7. Let Ψ ⊆ Sen(Σ) and A and B be two Σ-algebras.

A is said to be Ψ-equivalent to B, denoted by $A \equiv_\Psi B$, if and only if we have

$\forall \varphi \in \Psi,\ A \models \varphi \iff B \models \varphi$.

A is said to be observationally equivalent to B if and only if $A \equiv_{Obs} B$.

We can now define observational correctness:

Definition 8. Let IUT be an implementation under test satisfying $H^{obs}_{min}$.

IUT is observationally correct with respect to SP and according to Obs,
denoted by $Correct^{obs}(IUT, SP)$, if and only if

$\exists A \text{ reachable } SP\text{-algebra},\ IUT \equiv_{Obs} A$



^11 Let us remark that if t and t' are of observable sort s, then the only minimal
observable context is $z_s$, so that the equalities tρ = t'ρ are the only tests associated to the axiom
t = t'.



Remark 2. This notion of observational correctness was first recommended
for testing purposes by Le Gall and Arnould in [41, 42] for a large class of
specifications and observations^12. With respect to the observational approaches in
algebraic specifications [13], it corresponds to abstractor specifications, for which
the set of algebras is defined as the set of all algebras equivalent to at least
one algebra of a kernel set, basically the set of all algebras satisfying the set of
axioms.

From a testing point of view, each reachable SP-algebra is obviously
observationally correct with respect to SP. Conversely, an implementation IUT is
observationally correct if it cannot be distinguished by observations from at least
one reachable SP-algebra, say $IUT_{SP}$. So, nobody can say whether the
implementation is the SP-algebra $IUT_{SP}$, and thus intrinsically correct, or whether the IUT is just
an approximation of one reachable Σ-algebra up to the observations Obs. Thus,
under the hypothesis $H^{obs}_{min}$, any observationally correct IUT should be kept.
Finally, $Correct^{obs}(IUT, SP)$ captures exactly the set of all implementations
which look like SP-algebras up to the observations in Obs. With this
appropriate definition of $Correct^{obs}(IUT, SP)$, the test set $Exhaust^{obs}_{SP}$ is exhaustive.
A sketch of the proof is the following. For each IUT passing $Exhaust^{obs}_{SP}$, let us
consider the quotient algebra Q built from IUT with the axioms of SP. We can
then show that Q is an SP-algebra and is observationally equivalent to IUT.

6.2.2 Positive conditional specifications with observable premises

We also get an exhaustive test set when considering axioms with observable
premises. For each axiom of the form $\alpha_1 \wedge \ldots \wedge \alpha_n \Rightarrow t = t'$ with all $\alpha_i$ of
observable sort, it suffices to put in the corresponding exhaustive test set all the
tests of the form $\alpha_1\rho \wedge \ldots \wedge \alpha_n\rho \Rightarrow c[t]\rho = c[t']\rho$ for all substitutions $\rho : V \to T_\Sigma$
and for all minimal observable contexts c.

Moreover, if we want to have an exhaustive test set involving equations only,
as it has been done in Section [4], we should restrict to specifications with
observable premises and complete with respect to the set $C_{obs}$ of constructors of
observable sorts. As in Section [4], we also consider that the IUT correctly
implements the constructors of all the sorts occurring in the premises, here the
observable sorts^13. That is to say, IUT satisfies $H_{min,\Omega_{obs}}$ iff IUT satisfies $H_{min}$
and:

$\forall s \in S_{obs},\ \forall u, v \in T_{\Omega,s},\quad IUT \text{ passes } u = v \iff SP \models u = v$

Under $H_{min,\Omega_{obs}}$ and for the considered restricted class of specifications (i.e.
observable premises and completeness with respect to $C_{obs}$),

$EqExhaust^{obs}_{SP} = \{\, c[t]\rho = c[t']\rho \mid \exists\, \alpha_1 \wedge \ldots \wedge \alpha_n \Rightarrow t = t' \in Ax,\ \rho : V \to T_\Sigma,\ c \text{ min. obs. context},\ SP \models (\alpha_1 \wedge \ldots \wedge \alpha_n)\rho \,\}$

is an exhaustive test set with respect to observational correctness.

^12 For interested readers, [1UI41I42] give a generic presentation of formal testing from
algebraic specifications in terms of institutions.

^13 When observable sorts coincide with the basic data types of the programming
language, such a hypothesis is quite plausible. Thus, this is a weak hypothesis.
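As an illustration, a finite slice of this equational test set can be computed for the axiom remove_2, whose premise eq(x, y) = false is observable. In the Python sketch below, the premise is decided by Python equality on integers (standing for SP ⊨ eq(x, y) = false), only the contexts isin(e, z) are used, and the class `ListImpl` is a hypothetical implementation under test; all these choices are assumptions of the sketch.

```python
from itertools import product

class ListImpl:
    """Hypothetical IUT: containers as lists, remove deletes the first occurrence."""
    @staticmethod
    def cons(x, c):
        return [x] + c

    @staticmethod
    def remove(x, c):
        if not c:
            return []
        return c[1:] if c[0] == x else [c[0]] + ListImpl.remove(x, c[1:])

    @staticmethod
    def isin(x, c):
        return x in c

def containers(nats, max_len):
    for n in range(max_len + 1):
        for c in product(nats, repeat=n):
            yield list(c)

def eq_exhaust_slice(nats, max_len):
    """Tests (e, x, y, c) standing for
    isin(e, remove(x, y :: c)) = isin(e, y :: remove(x, c)),
    kept only when the premise eq(x, y) = false holds."""
    tests = []
    for x, y in product(nats, repeat=2):
        if x == y:        # premise not satisfied: no test is produced
            continue
        for c in containers(nats, max_len):
            for e in nats:
                tests.append((e, x, y, c))
    return tests

def run(test, impl):
    e, x, y, c = test
    lhs = impl.remove(x, impl.cons(y, c))
    rhs = impl.cons(y, impl.remove(x, c))
    return impl.isin(e, lhs) == impl.isin(e, rhs)

tests = eq_exhaust_slice(nats=range(2), max_len=1)
print(len(tests), all(run(t, ListImpl) for t in tests))
```

Every submitted test is an equation between observable values, so the verdict only relies on the equality of the observable sort Bool.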

6.2.3 Generalisation to non-observable premises 

Is it possible to generalise such a construction of an exhaustive test set
to specifications with positive conditional formulas comprising non-observable
premises? A first naive solution would consist in replacing each non-observable
equation t = t' occurring either in the premise or in the conclusion of the axioms
by a subset of Obs(t = t'). Unfortunately, such an idea cannot be applied, unless
one accepts to submit biased tests^14. This fact has been reported by Bernot and
others in [8, 10]. To give an intuition, let us consider a new axiom

x :: x :: l = x :: l ⇒ true = false

which means that if addition to a container is idempotent, then^15 it would lead
to true = false. Let us try to test the ground instance 0 :: 0 :: [] = 0 :: [] ⇒
true = false by considering a test φ in Obs of the form

$\varphi = \Bigl(\bigwedge_{i \in I} \psi_i\Bigr) \Rightarrow true = false$, with $\psi_i \in Obs(0 :: 0 :: [] = 0 :: [])$ and I a finite index set;

then the IUT may pass the premise

$\bigwedge_{i \in I} \psi_i$, with $\psi_i \in Obs(0 :: 0 :: [] = 0 :: [])$ and I a finite index set,

without 0 :: 0 :: [] = 0 :: [] being a consequence of the specification. In
that case, the IUT can pass the test φ only by passing the conclusion true = false.
Thus, observing non observable premises through a finite set of contexts leads
to requiring an observable equality, here true = false, which in fact is not required
by the specification. This is clearly a bad idea.
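The bias can be reproduced on a small scale. In the Python sketch below, a hypothetical list implementation (in which insertion is not idempotent) is submitted to the test φ built with the finite set of contexts isin(e, z), for e from 0 to 4. The premise is observed as true, so the verdict depends on true = false and the implementation is rejected, although another context shows that the premise does not actually hold in it. The names used are assumptions of the sketch.

```python
# Hypothetical list implementation of Containers.
def cons(x, c):
    return [x] + c

def remove(x, c):
    if not c:
        return []
    return c[1:] if c[0] == x else [c[0]] + remove(x, c[1:])

def isin(x, c):
    return x in c

lhs = cons(0, cons(0, []))   # 0 :: 0 :: []
rhs = cons(0, [])            # 0 :: []

# Finite selection of observable contexts isin(e, z) for the premise.
finite_contexts = [lambda z, e=e: isin(e, z) for e in range(5)]
premise_observed = all(c(lhs) == c(rhs) for c in finite_contexts)

conclusion = (True == False)                      # the equation true = false
verdict = (not premise_observed) or conclusion    # verdict of the test phi

# A context outside the selection separates the two terms.
separating = lambda z: isin(0, remove(0, z))

print(premise_observed, verdict, separating(lhs) != separating(rhs))
# True False True
```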

It is now widely recognised that non-observable equations may be observed
through some subset of observable contexts only when their position in the test
is positive^16. For example, the disjunctive normal form of 0 :: 0 :: [] = 0 :: [] ⇒
true = false is ¬(0 :: 0 :: [] = 0 :: []) ∨ true = false and thus 0 :: 0 :: [] = 0 :: []
has a negative position in the test. In particular, Machado in [44, 45] considers any
first order formula whose Skolem form does not contain existential quantifiers.
All non-observable equations in positive positions are observed by means of
observable contexts while those in negative positions are observed by using
concrete equality in the implementation. In that sense, Machado's approach is not a
pure black-box approach deriving test cases and oracles from specifications but
an approach mixing black-box and white-box, where test cases are derived from
the specifications and the oracle procedure is built from both the specification
and the IUT.

^14 A test is said to be biased when it rejects at least one correct implementation.

^15 This is no more than a positive conditional way of specifying x :: x :: l ≠ x :: l.
Actually, as the trivial algebra (with one element per sort) satisfies all the
positive conditional specifications, the inconsistency of specifications is often expressed
by the possibility of deriving the boolean equation true = false.

^16 Roughly speaking, an atom t = t' is said to be in a positive position if, by putting
the test into disjunctive normal form, t = t' is not preceded by a negation.

We have shown that to deal with axioms with non-observable premises, it is 
not possible to apply observable contexts. However, can we do something else to 
handle such axioms? A tempting solution is to use the specification to recognise 
some ground instances of the axiom for which the specification requires the non 
observable premise to be true. 

Let us come back to the axiom

x :: x :: l = x :: l ⇒ true = false

If it stands alone, nothing can be done to test it. Let us introduce a new
axiom stating the idempotence law on the element insertion:

eq(x, y) = true ⇒ x :: y :: l = y :: l

Any ground instance of x :: x :: l = x :: l is then a semantic consequence of the
specification, so that true = false also becomes a semantic consequence. In
such a case, one would like to consider true = false as a test and, even more, it
seems rather crucial to submit precisely this test! This small example illustrates
clearly why, in this case, tests cannot be only ground instances of axioms but
should be selected among all the observable semantic consequences of the
specification^17 (see the end of Section [2] for the definition of semantic consequence).
Let us remark that according to the form of the specifications, one can use the
unfolding technique described in Section [5] in order to solve the premise in the
specification. In [41], Le Gall has shown that when the specification is
complete with respect to the set $C_{obs}$ of constructors of observable sorts and under
$H_{min,\Omega_{obs}}$,

$EqExhaust^{obs}_{SP} = \{\, c[t]\rho = c[t']\rho \mid \exists\, \alpha_1 \wedge \ldots \wedge \alpha_n \Rightarrow t = t' \in Ax,\ \rho : V \to T_\Sigma,\ c \text{ min. obs. context},\ SP \models (\alpha_1 \wedge \ldots \wedge \alpha_n)\rho \,\}$

is an exhaustive test set with respect to observational correctness. Curiously,
whether there are non-observable premises or not in the specification, the
corresponding equational exhaustive test set is not modified.



^17 Observable semantic consequences are just those semantic consequences that belong
to Obs. By construction, selecting a test outside this set would reject at least one
correct implementation.



7 Related work 

7.1 Related work on selection 

In [55], Claessen and Hughes propose the QuickCheck tool for randomly testing
Haskell programs from algebraic specifications. Axioms are encoded into
executable Haskell programs whose arguments denote axiom variables. Conditional
properties are tested by drawing data until finding a number, given as a
parameter, of cases which satisfy the premises. Of course, the procedure is stopped
when too large a number of values has been drawn. The QuickCheck tool provides the
user with test case generation functions for any arbitrary Haskell datatype, and
in particular, also for functional types. The user can observe how the random
data are distributed over the datatype carrier. When the user considers that the
distribution is not well balanced on the whole domain, for instance if the premises
are satisfied by data of small size only, it is possible to specialise the test case
generation functions to increase the likelihood of drawing values ensuring a better
coverage of the domain of premise satisfaction. This last feature is very useful for
dealing with dependent datatypes. In [7], Berghofer and Nipkow use QuickCheck
to exhibit counter-examples for universally quantified formulae written in
executable Isabelle/HOL. This is a simple way to rapidly debug the formalisation of
a theory. In [31], Dybjer et al. develop a similar approach, using functional
testing techniques to help proof construction by analysing counter-examples.

In [18], Brucker and Wolff use the full theorem proving environment
Isabelle/HOL to present a method and a tool, HOL-TestGen, for generating test
cases. They recommend taking advantage of the Isabelle/HOL proof engine equipped
with tactics to transform a test domain (denoted as some proof goal) into test
subdomains (denoted as proof subgoals). Selection hypotheses are expressed as
proof hypotheses and the user can interact to guide the test data generation.
Both the QuickCheck and TestGen tools have the advantage of offering a
unified framework to deal with the specification, the selection and the
generation of test cases, and even the submission of the test cases and the computation
of the test verdict.



7.2 Related work on observability 

We have given a brief account of observability considerations and their impor- 
tant impact on testability issues. In particular, there does not always exist an 
exhaustive test set, since such an existence depends on some properties of the 
specification and the implementation: namely, restrictions on the specification 
and hypotheses on the implementation. 

The importance of observability issues for the oracle problem was first
raised by Bougé [16] and then by Bernot, Gaudel and Marre in [8, 10]. It has been
studied later on by Le Gall and Arnould [52] and Machado [3144]. Depending on
the hypotheses on the possible observations and on the form or the extensions
of the considered specifications, the oracle problem has been specialised. For
example, in [4], Arnould et al. define a framework for testing from specifications
of bounded data types. To some extent, bounds of data types limit the possible
observations: any data out of the scope of the bound description should not be
observed when testing against such specifications. The observable formulae
are the formulae which are observable in the classical sense and in which all terms are
computed as being under the specified bound.

As soon as partial functions are considered in the specification, it must be
observable whether a term is defined or not. In [3], Arnould and Le Gall consider
specifications with partial functions where definedness can be specified using a
unary predicate def. Equalities are specified using two predicates: strong
equality =, which allows two undefined terms to be equal, and existential
equality =e, for which only defined terms may be considered as equal. As the
predicate =e may be expressed in terms of = and def, testing from specifications
with partiality naturally introduces two kinds of elementary tests, directly
related to the predicates def and =. Testing with partial functions requires
taking the definedness predicate into account: intuitively, testing whether a
term is defined or not systematically precedes the next testing step, which
tests the equality of terms. Some initial results about testability and
exhaustive test sets can be found in [5].
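
The two-step discipline described above (definedness first, equality next) can
be sketched as follows. This is only an illustrative sketch and not the
framework of the cited papers: the exception Undefined, the helper names
is_defined, strong_eq and existential_eq, and the partial operation head are
assumptions introduced for the example. Terms are represented as
zero-argument functions so that their evaluation can be delayed.

```python
class Undefined(Exception):
    """Models an operation applied outside its definition domain (assumption)."""


def is_defined(term):
    """Elementary test for the predicate def: evaluate the term, report definedness."""
    try:
        term()
    except Undefined:
        return False
    return True


def strong_eq(t1, t2):
    """Strong equality: both terms undefined, or both defined with equal values."""
    d1, d2 = is_defined(t1), is_defined(t2)
    if not d1 or not d2:
        return d1 == d2
    return t1() == t2()


def existential_eq(t1, t2):
    """Existential equality, expressed with def and strong equality."""
    return is_defined(t1) and is_defined(t2) and strong_eq(t1, t2)


def head(items):
    """A hypothetical partial operation: head is undefined on the empty list."""
    if not items:
        raise Undefined("head of an empty list")
    return items[0]
```

With these definitions, two undefined terms such as head([]) are strongly
equal but not existentially equal, which is the distinction the two
predicates are meant to capture.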

7.3 Variants of exhaustivity 

Most exhaustive test sets presented here are made of tests directly derived from
the axioms: tests are ground instances of (conclusions of) axioms, some equalities
being possibly surrounded by observable contexts. Such tests do not necessarily
reflect the practice of testing. Actually, the usual way of testing consists in
applying the operation under test to some tuples of ground constructor terms
and comparing the value computed by the IUT to a ground constructor term
denoting the expected result. This can be described by tests of the form:

f(u1, ..., un) = v

with f the function to be tested, and u1, ..., un, v ground constructor terms. The
underlying intuition is that the constructor terms can denote all the concrete
values manipulated by the implementation (reachability constraint). To illustrate
this point of view, in the case of the Containers specification and considering
again that the sort Container is observable, for the axiom removeJl, instead of
testing remove(2, 3 :: []) = 3 :: remove(2, []) by solving the premise
eq(2, 3) = false, a test of the right form would be remove(2, 3 :: []) = 3 :: [].
Such a test may be obtained by applying the remove A axiom to the occurrence
remove(2, []). In particular, LOFT [47,48] computes tests of this reduced form.
In [1"3"4', Arnould et al. present some exhaustive test sets built from such
tests, involving constructor terms as much as possible.
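
As an illustration, the construction of a test in this reduced form can be
sketched as below. The sketch is hypothetical: it assumes that remove is
specified by three axioms (empty container, head equal to the removed
element, head different from it), of which only the first and the last are
used in the example above; the functions spec_remove, iut_remove and
to_concrete are names introduced here, and the IUT is assumed to work on
Python lists.

```python
NIL = ("nil",)


def cons(x, c):
    """Constructor term x :: c."""
    return ("cons", x, c)


def spec_remove(x, c):
    """Normalise remove(x, c) to a constructor term, using the axioms as rewrite rules."""
    if c == NIL:
        return NIL                       # remove(x, []) = []
    _, y, rest = c
    if x == y:
        return rest                      # assumed axiom for eq(x, y) = true
    return cons(y, spec_remove(x, rest))  # axiom for eq(x, y) = false


def to_concrete(c):
    """Map a constructor term to the concrete value handled by the IUT."""
    out = []
    while c != NIL:
        _, y, c = c
        out.append(y)
    return out


def iut_remove(x, items):
    """Hypothetical implementation under test."""
    out = list(items)
    if x in out:
        out.remove(x)
    return out


def reduced_test(x, c):
    """Run the test remove(x, c) = v, where v is the normalised expected result."""
    expected = spec_remove(x, c)
    return iut_remove(x, to_concrete(c)) == to_concrete(expected)
```

For instance, reduced_test(2, cons(3, NIL)) submits remove(2, 3 :: []) and
compares the result with the constructor term 3 :: [].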

7.4 The case of structured specifications 

Until now, we have considered flat specifications which consist of a signature, a
set of axioms, and possibly reachability constraints. Moreover, we have studied
the distinction between observable and non-observable sorts. Observable sorts
often correspond to the basic types provided by the programming environment,
and non-observable sorts to the type of interest for the specification. However,
algebraic specifications may be structured using various primitives that make it
possible to import, combine, enrich, rename or forget (pieces of) imported
specifications. Such constructions should be taken into account when testing.

As a first step towards integration testing of systems described by structured
algebraic specifications, Machado in [45,46] shows how to build a test set whose
structure is guided by the structure of the specification. The main and
significant drawback of this approach is that hidden operations are ignored. As
soon as an axiom involves a hidden operation, the axiom is not tested. Depending
on the organisation of the specification, this can mean that a lot of properties
are removed from the set of properties to be tested.
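
The effect of ignoring hidden operations can be made concrete with a small
sketch. It is not taken from the cited work: the axiom names, the operation
names and the function split_axioms are hypothetical, and each axiom is
simply described by the set of operations it mentions.

```python
def split_axioms(axioms, hidden_ops):
    """Separate the axioms that can be tested from those mentioning a hidden operation."""
    tested, ignored = [], []
    for name, ops in axioms.items():
        (ignored if ops & hidden_ops else tested).append(name)
    return tested, ignored


# Hypothetical structured specification: 'sorted_insert' is assumed to be
# hidden by the export interface of the specification.
AXIOMS = {
    "isin_empty": {"isin", "empty"},
    "add_isin": {"add", "isin"},
    "add_sorted": {"add", "sorted_insert"},
    "sorted_idem": {"sorted_insert"},
}
HIDDEN = {"sorted_insert"}

tested, ignored = split_axioms(AXIOMS, HIDDEN)
```

In this made-up example, half of the axioms are dropped from the set of
properties to be tested, although only one operation is hidden.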

In [5S], Doche and Wiels define a framework for composing test cases according
to the structure of the specification. Their approach may be considered as
modular since the IUT should have the same structure as the specification and
the tests related to the subspecifications are composed together. These authors
have established that correctness is preserved under some hypotheses^* and have
applied their approach to an industrial case study reported in f2E\.

8 Case studies and applications to other formal methods 

This part of the paper briefly reports some case studies and experiments related 
to the theory presented here. Some of them were performed at LRI, some of 
them elsewhere. The first subsection is devoted to studies based on algebraic 
specifications. The next one reports interesting attempts to transpose some as- 
pects of the theory to other formal approaches, namely VDM, Lustre, extended 
state machines and labelled transition systems. A special subsection presents 
some applications to object-oriented descriptions. 

8.1 First case studies with algebraic specifications 

A first experiment, performed at LRI by Dauchy and Marre, was on the on-board
part of the driving system of an automatic subway^^ in collaboration
with a certification agency. An algebraic specification was written [26]. Then
two critical modules of the specification were used for experiments with LOFT:
the overspeed controller and the door opening controller. These two modules
shared the use of eight other specification modules that described the state of
the on-board system. The number of axioms for the door controller was 25, with
rather complex premises. The number of axioms of the overspeed controller was
34. There were 108 function names and several hundred axioms in the shared



^* For interested readers, the hypotheses aim at preserving properties along signature 
morphisms and thus, are very close to the satisfaction condition of the institution 
framework. 

^^ Precisely, the train controller of line D in Lyon, which has been operating since 1991.



modules. Different choices of uniformity hypotheses were tried for the
door controller: they led to 230, 95, and 47 tests. For the overspeed controller,
only one choice was sensible and led to 95 tests. The experiment is reported in
detail in [5S]. In a few words, these tests were used by the certification team as
a sort of checklist against the tests performed by the development team. This
approach led to the identification of a tricky combination of conditions that had
not been tested by the developers.

A second experiment is reported in [52] and was performed within a
collaboration between LRI and the LAAS laboratory in Toulouse. The experiment was
performed on a rather small piece of software written in C, which was extracted
from a nuclear safety shutdown system. The piece of software contained some
already known bugs, all of which were discovered but one: the remaining one was
related to some hidden shared variable in the implementation, and required
rather large instantiations, larger than the bound chosen a priori for the
regularity hypothesis. From a theoretical point of view, this can be analysed as
a case where the testability hypothesis was not ensured. More practically, the
fault was easy to detect by "white-box" methods, either static analysis or
structural testing with branch coverage. This is consistent with the remark in
Section 2 on the possibility of static checking of the testability hypothesis,
and with footnote 4 in Section 5 on the difficulty of determining adequate
bounds for regularity hypotheses.
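
The sensitivity to the bound of a regularity hypothesis can be illustrated by
a small sketch. It does not reproduce the fault of the case study: the
function faulty_remove is a hypothetical implementation that misbehaves only
on containers of more than three elements, and the bound is taken to be the
maximal size of the containers used as test data.

```python
from itertools import product


def containers_up_to(bound, values=(0, 1)):
    """Enumerate all containers of size at most `bound` over a small value set."""
    for n in range(bound + 1):
        for items in product(values, repeat=n):
            yield list(items)


def reference_remove(x, items):
    """Expected behaviour: remove the first occurrence of x, if any."""
    out = list(items)
    if x in out:
        out.remove(x)
    return out


def faulty_remove(x, items):
    """Hypothetical faulty IUT: wrong only for containers of more than three elements."""
    if len(items) > 3:
        return list(items)  # the removal is silently skipped
    return reference_remove(x, items)


def passes(iut, bound, values=(0, 1)):
    """Submit every test allowed by the regularity hypothesis with the given bound."""
    return all(
        iut(x, c) == reference_remove(x, c)
        for c in containers_up_to(bound, values)
        for x in values
    )
```

Here passes(faulty_remove, 3) holds while passes(faulty_remove, 4) does not:
the fault is revealed only when the bound exceeds the one chosen a priori.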

An experiment of "intensive" testing of the EPFL library of Ada components
was conducted by Buchs and Barbey in the Software Engineering Laboratory at EPFL
[6]. First an algebraic specification of the component was reengineered: the
signature was derived from the package specifications of the family, and the
axioms were written manually. Then the LOFT system was used with a standard
choice of hypotheses.

LOFT has also been used for the validation of a transit node algebraic
specification [2]. Generating test cases was used for enumerating scenarios with
a given pattern. It led to the identification of one undesirable, and unexpected,
scenario in the formal specification.

It was also used for the test of the data types of an implementation of the 
Two-Phase-Commit protocol [33] without finding any fault: this was probably 
due to the fact that the implementation had been systematically derived from 
a formal specification. Other aspects of this case study are reported in the next 
subsection. 

The specifications and test sets of these case studies are too large to be given 
here. Details can be found in [26] and [25] for the first one, in [2] and [49] for 
the transit node, and in [40] for the Two-Phase-Commit protocol. 



8.2 Applications to other methods 

Actually, the approach developed here for algebraic data types is rather generic 
and presents a general framework for test data selection from formal specifica- 
tions. It has been reused for, or has inspired, several test generation methods 
from various specification formalisms: VDM, Lustre, full LOTOS. 



The foundational paper by Jeremy Dick and Alain Faivre on test case generation
from VDM specifications [57] makes numerous references to some of the
notions and techniques presented here, namely uniformity and regularity
hypotheses, and unfolding. The formulae of VDM specifications are relations on
states described by operations (in the sense of VDM, i.e. state modifications).
They are expressed in first-order predicate calculus. These relations are reduced
to a disjunctive normal form (DNF), creating a set of disjoint sub-relations. Each
sub-relation yields a set of constraints which describe a single test domain. The
reduction to DNF is similar to axiom unfolding: uniformity and regularity
hypotheses appear in relation with this partition analysis. As VDM is state-based,
it is not enough to partition the operation domains. Thus the authors give a
method of extracting a finite state automaton from a specification. This method
uses the results of the partition analysis of the operations to perform a partition
analysis of the states. This leads to a set of disjoint classes of states, each of
which corresponds either to a precondition or a postcondition of one of the above
sub-relations. Thus, a finite state automaton can be defined, where the states are
some equivalence classes of states of the specification. From this automaton,
some test suites are produced such that they ensure a certain coverage of the
automaton paths. The notion of test suites is strongly related to the state
orientation of the specification: it is necessary to test the state evolution in
presence of sequences of data, the order being important.
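
The reduction step can be sketched on propositional formulae. The sketch
below is a plain DNF conversion and makes several simplifying assumptions:
formulae are nested tuples built with "and" and "or" over atomic constraints
given as strings, negation is assumed to be already pushed into the atoms,
and the additional step that makes the sub-relations pairwise disjoint is
omitted. The function name dnf and the example formula are hypothetical.

```python
def dnf(formula):
    """Return the DNF of a formula as a list of conjunctions (sets of atomic constraints)."""
    if isinstance(formula, str):
        return [frozenset([formula])]
    op, left, right = formula
    left_dnf, right_dnf = dnf(left), dnf(right)
    if op == "or":
        # A disjunction contributes the sub-relations of both sides.
        return left_dnf + right_dnf
    if op == "and":
        # A conjunction distributes over the sub-relations of both sides.
        return [a | b for a in left_dnf for b in right_dnf]
    raise ValueError(f"unknown connective: {op}")


# Hypothetical relation for a 'max' operation with result r.
MAX_RELATION = ("or", ("and", "x >= y", "r = x"), ("and", "x < y", "r = y"))

test_domains = dnf(MAX_RELATION)
```

Each element of test_domains is a set of constraints describing one candidate
test domain, in which a uniformity hypothesis would allow a single
representative to be selected.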

Test generation from Lustre descriptions was first studied jointly at CEA
and LRI. The use of the LOFT system to assist the test of Lustre programs has
been investigated. Lustre is a description language for reactive systems which is
based on the synchronous approach fST". An algebraic semantics of Lustre was
stated and entered as a specification in LOFT. Lustre programs were considered
as enrichments of this specification, just like specific axioms to be tested.
After this first experience, GATEL, a specific tool for Lustre, was developed by
Marre at CEA (Commissariat à l'Énergie Atomique). In GATEL, a Lustre
specification of the IUT, and some Lustre descriptions of environment constraints
and test purposes, are interpreted via Constraint Logic Programming. Unfolding
is the basic technique, coupled with a specific constraint solving library [50,51].
GATEL is used at IRSN (Institut de Radioprotection et de Sûreté Nucléaire) for
identifying those reachable classes of tests covering a given specification,
according to some required coverage criteria. The functional tests performed by
the developers are then compared to these classes in order to point out uncovered
classes, i.e. insufficient testing. If this is the case, GATEL provides test
scenarios for the missing classes.
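
The comparison between the developers' tests and the classes of tests can be
pictured with the following sketch. It is in no way GATEL's interface: classes
are assumed to be given as membership predicates on a single numeric input,
whereas GATEL handles Lustre sequences and constraints, and the names
uncovered_classes and SPEED_CLASSES are hypothetical.

```python
def uncovered_classes(classes, tests):
    """Return the names of the classes that contain none of the submitted tests."""
    return [
        name
        for name, member in classes.items()
        if not any(member(t) for t in tests)
    ]


# Hypothetical classes for an overspeed check with a limit of 80.
SPEED_CLASSES = {
    "below_limit": lambda v: v < 80,
    "at_limit": lambda v: v == 80,
    "above_limit": lambda v: v > 80,
}

developer_tests = [10, 50, 120]
missing = uncovered_classes(SPEED_CLASSES, developer_tests)
```

In this example the class at_limit is reported as uncovered, which is where a
generated test scenario would be needed.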

LOTOS is a well-known formal specification language, mainly used in the
area of communication protocols. There are two variants: basic LOTOS, which makes
it possible to describe processes and their synchronisation, with no notion of
data type; and full LOTOS, where it is possible to specify algebraic data types
and how their values can be communicated or shared between specified processes.
In the first case, the underlying semantics of a basic LOTOS specification is a
finite labelled transition system. There is an extremely rich corpus of testing
methods based on such finite models (see [T7] for an annotated bibliography).
However, there are few results on extending them to infinite models, as is the
case when non-trivial data types are introduced. In [33], Gaudel and James have
stated the underlying notions of testability hypotheses, exhaustive test sets,
and selection hypotheses for full LOTOS.

This approach has been used by James for testing an implementation of
the Two-Phase-Commit Protocol developed from a LOTOS specification into
Concert/C. The results of this experiment are reported in [3D]. As said in the
previous subsection, tests for the data types were obtained first with the LOFT
system. Then a set of testers was derived manually from the process part of the
specification. The submission of these tests was preceded by a test campaign of
the implementations of the atomic actions of the specification by the Concert/C
library, i.e. the communication infrastructure (the set of gates connecting the
processes), which was developed step by step. It was motivated by the testability
hypothesis: it was a way of ensuring that the actions in the implementation
were the same as in the specification, and that they were atomic. No errors
were found in the data type implementations, but an undocumented error of the
Concert/C pre-processor was detected when testing them. Some errors were
discovered in the implementation of the main process. They were related to memory
management, and to the treatment of the time-outs. There are always questions
on the interest of testing pieces of software which have been formally specified
and almost directly derived from the specification. But this experiment shows
that problems may arise: the first error-prone aspect, memory management, was
not expressed in the LOTOS specification because of its abstract nature; the
second one was specified in a tricky way due to the absence of explicit time
in classical LOTOS. Such unspecified aspects are unavoidable when developing
efficient implementations.

8.3 Applications to object-oriented software 

It is well known that there is a strong relationship between abstract data types
and object orientation. There is the same underlying idea of encapsulation of the
concrete implementation of data types. Thus it is not surprising that the testing
methods presented here for algebraic specifications have been adapted to the test
of object-oriented systems. We present two examples of such adaptations.

The ASTOOT approach was developed by Doong and Frankl at the Polytechnic
University in New York [3D]. The addressed problem was the test of
object-oriented programs: classes are tested against algebraic specifications. A
set of tools was developed. As mentioned at the end of Section 4, a
different choice was made for the exhaustive test set, which is the set of
equalities of every ground term with its normal form, and it was also suggested
to test inequalities of ground terms. As normal forms are central in the
definition of tests, there was a requirement that the axioms of the specification
must define a convergent term rewriting system. Moreover, there is a restriction
to classes such that their operations have no side effects on their parameters
and functions have no side effects: it corresponds to a notion of testability.
The oracle problem was addressed by introducing a notion of observational
equivalence between objects of user-defined classes, which is based on minimal
observational contexts, and by approximating it. Similarly to Section 5, the test
case selection was guided by an analysis of the conditions occurring in the
axioms; the result was a set of constraints that was solved manually. The theory
presented here for algebraic data types turned out to fit object orientation
nicely, even when different basic choices were made.
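
A test in this style pairs a ground term with its normal form and checks that
the two resulting objects are observationally equivalent. The sketch below
only illustrates the idea and is not the ASTOOT tool set: the class Stack, the
two rewrite rules pop(push(s, x)) -> s and pop(new) -> new, and the
approximation of observational equivalence by draining the stack with top and
pop are all assumptions made for the example.

```python
class Stack:
    """Hypothetical class under test."""

    def __init__(self):
        self._items = []

    def push(self, x):
        self._items.append(x)

    def pop(self):
        if self._items:
            self._items.pop()

    def top(self):
        return self._items[-1] if self._items else None

    def is_empty(self):
        return not self._items


def normal_form(ops):
    """Rewrite a sequence of calls with pop(push(s, x)) -> s and pop(new) -> new."""
    out = []
    for op in ops:
        if op[0] == "pop":
            if out:
                out.pop()  # cancels the most recent remaining push
        else:
            out.append(op)
    return out


def run(ops):
    """Execute a ground term, given as a sequence of method calls on a new object."""
    s = Stack()
    for name, *args in ops:
        getattr(s, name)(*args)
    return s


def observe(s):
    """Approximate observational equivalence by the sequence of values seen on top."""
    seen = []
    while not s.is_empty():
        seen.append(s.top())
        s.pop()
    return seen


def equivalence_test(ops):
    """Check that a term and its normal form lead to equivalent objects."""
    return observe(run(ops)) == observe(run(normal_form(ops)))
```

For example, the term push(1); push(2); pop; push(3) has the normal form
push(1); push(3), and equivalence_test submits both sequences to the class.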

This has been confirmed by further developments by Tse and his group at
the University of Hong Kong [20,21,55]. In their approach, object-oriented
systems are described by algebraic specifications for classes and contract
specifications for clusters of related classes: contracts specify interactions
between objects via message-passing rules. As in our approach, some tests are
fundamental pairs of equivalent ground terms obtained via instantiations of the
axioms. As in ASTOOT, non-equivalent pairs of terms are also considered. Some
white-box heuristic for selecting relevant observable contexts makes it possible
to determine whether the objects resulting from executing such test cases are
observationally equivalent. Moreover, message-passing test sequences are derived
from the contract specification and the source code of the methods. This method
has been recently applied for testing object-oriented industrial software [55].

9 Conclusion 

Algebraic specifications have proved to be an interesting basis for stating some
theory of black-box testing and for developing methods and tools. The underlying
ideas have turned out to be rather general and applicable to specification
methods including datatypes, whatever the formalism used for their description.
This is the case for the notions of uniformity and regularity hypotheses,
which have been reused in other contexts.

In presence of abstraction and encapsulation, the oracle problem raises dif- 
ficult issues due to the limitations on the way concrete implementations can 
be observed and interpreted. This is not specific to algebraic specifications and 
abstract data types: the same problems arise for embedded and/or distributed 
systems. It is interesting to note the similarity between the observable contexts 
presented here, and the various ways of distinguishing and identifying the state 
reached after a test sequence in finite state machines [l^, namely separating 
families, distinguishing sequences, characterising sets, and their variants. 

The methodology presented here has been applied, as such or with some
adjustments, in a significant number of academic or industrial case studies. In
most cases, it has been used for some a posteriori certification of critical
systems that had already been intensively validated and verified, or for testing
implementations that had been developed from some formal specification. This
is not surprising: in the first case, the risks are such that certification agencies are
ready to explore sophisticated methods; in the second case, the availability of the
formal specification pushes for using it for test generation. In both circumstances,
it was rather unlikely to find errors. Nevertheless, some were discovered, and
missing test cases were identified. In some cases, these detections were quite
welcome and prevented serious problems. This is an indication of the interest of
test methods based on formal specifications, and of the role they can play in the
validation and verification process.
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